


Creators/Authors contains: "Nikolic, Borivoje"


  1. Free, publicly accessible full text available June 17, 2024
  2. Free, publicly accessible full text available June 17, 2024
  3.
    The design of computing systems has changed dramatically over the past decade, but most courses in advanced computer architecture remain unchanged. Computer architecture education lies at the intersection of computer science and electrical engineering, and its practical exercises are built on appropriate levels of abstraction in the computing system design stack. Hardware-centric lab exercises often require extensive infrastructure and tend to sidestep tedious practical implementation details, while software-centric exercises leave a gap between modeling and its system-implementation implications that students must later bridge in professional settings. Vertical integration trends in domain-specific computing systems, as well as hardware-software co-design, are often covered in lectures but are not reflected in laboratory exercises because of complex tooling and simulation infrastructure. We describe our experiences with a joint hardware-software approach to exploring computer architecture concepts in class exercises, using open-source processor hardware implementations, generator-based hardware design methodologies, and cloud-hosted FPGAs. This approach also enables scaling course enrollment, remote learning, and a cross-class collaborative lab ecosystem, creating a connecting thread between the experience-based curricula of computer science and electrical engineering.
  4.
    Wireless networks at millimeter wavelengths pose significant implementation challenges. The path loss at these frequencies naturally leads us to consider antenna arrays with many elements. In these arrays, local oscillator (LO) generation is particularly challenging, since the LO specifications affect the system architecture, signal processing design, and circuit implementation. We thoroughly analyze the effect of LO architecture design choices on the performance of a mm-wave massive MIMO uplink. This investigation focuses on the tradeoffs involved in centralized versus distributed LO generation, correlated versus uncorrelated phase noise sources, and the bandwidths of PLLs and carrier recovery loops. We show that, from both a performance and an implementation-complexity standpoint, the optimal LO architecture uses several distributed subarrays locked to a single intermediate-frequency reference in the low-GHz range. Additionally, we show that the choice of PLL and carrier recovery loop bandwidths strongly affects performance; for typical system parameters, loop bandwidths on the order of tens of MHz achieve SINRs suitable for high-order constellations. Finally, we present system simulations incorporating a complete model of the LO generation system and consider the case of a 128-element array with 16-way spatial multiplexing and a 2 GHz channel bandwidth at a 75 GHz carrier. Using our optimization procedure, we show that the system can support 16-way spatial multiplexing with 64-QAM modulation.
  5.
  6. We present FireSim, an open-source simulation platform that enables cycle-exact microarchitectural simulation of large scale-out clusters by combining FPGA-accelerated simulation of silicon-proven RTL designs with a scalable, distributed network simulation. Unlike prior FPGA-accelerated simulation tools, FireSim runs on Amazon EC2 F1, a public cloud FPGA platform, which greatly improves usability, provides elasticity, and lowers the cost of large-scale FPGA-based experiments. We describe the design and implementation of FireSim and show how it can provide sufficient performance to run modern applications at scale, enabling true hardware-software co-design. As an example, we demonstrate automatically generating and deploying a target cluster of 1,024 3.2 GHz quad-core server nodes, each with 16 GB of DRAM, interconnected by a 200 Gbit/s network with 2 microsecond latency, which simulates at a 3.4 MHz processor clock rate (less than 1,000x slowdown over real time). In aggregate, this FireSim instantiation simulates 4,096 cores and 16 TB of memory, runs ~14 billion instructions per second, and harnesses 12.8 million dollars' worth of FPGAs, at a total cost of only ~$100 per simulation hour to the user. We present several examples showing how FireSim can be used to explore research directions in warehouse-scale machine design, including modeling high-bandwidth, low-latency networks, integrating arbitrary RTL designs for a variety of commodity and specialized datacenter nodes, modeling a variety of datacenter organizations, and reusing the scale-out FireSim infrastructure to enable fast, massively parallel, cycle-exact single-node microarchitectural experimentation.
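The mm-wave LO abstract (item 4) contrasts correlated and uncorrelated phase noise across array elements. As a rough, generic illustration (not the authors' model), the sketch below assumes each element's unit-amplitude signal is rotated by a zero-mean Gaussian phase error and that the array combines elements with equal gain. A shared (correlated) LO gives every element the same error; independent LOs give each element its own draw. The function name `combined_phase_error_std` and all parameter values are hypothetical.

```python
import numpy as np


def combined_phase_error_std(num_elements: int, sigma_rad: float,
                             correlated: bool, trials: int = 20000,
                             seed: int = 0) -> float:
    """Monte Carlo estimate of the residual phase error after equal-gain combining.

    Hypothetical snapshot model: each element's unit-amplitude signal is rotated
    by a zero-mean Gaussian phase error with standard deviation sigma_rad.
    """
    rng = np.random.default_rng(seed)
    if correlated:
        # Shared LO: one phase draw per trial, identical on every element.
        common = rng.normal(0.0, sigma_rad, size=(trials, 1))
        phi = np.repeat(common, num_elements, axis=1)
    else:
        # Independent LOs: a separate phase draw for each element.
        phi = rng.normal(0.0, sigma_rad, size=(trials, num_elements))
    # Equal-gain combining: average the rotated unit phasors across elements.
    combined = np.exp(1j * phi).mean(axis=1)
    # Standard deviation of the combined signal's phase across trials.
    return float(np.angle(combined).std())


if __name__ == "__main__":
    for corr in (True, False):
        err = combined_phase_error_std(16, 0.1, correlated=corr)
        print(f"correlated={corr}: residual phase std = {err:.4f} rad")
```

With 16 elements and a per-element standard deviation of 0.1 rad, the correlated case stays near 0.1 rad while the uncorrelated case falls to roughly 0.1/sqrt(16) = 0.025 rad, because independent errors partially average out in the combiner. This snapshot deliberately ignores the PLL and carrier recovery loop filtering that the abstract identifies as critical; in particular, a common (correlated) phase error is the kind of impairment a carrier recovery loop can track, which this model does not capture.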
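The aggregate figures quoted in the FireSim abstract (item 6) follow from the per-node configuration by simple arithmetic. A minimal sketch, assuming a hypothetical `ClusterConfig` record and an average of 1.0 instructions per cycle (IPC) per core, which the abstract does not state:

```python
from dataclasses import dataclass


@dataclass
class ClusterConfig:
    """Hypothetical description of a simulated target cluster."""
    nodes: int
    cores_per_node: int
    dram_gb_per_node: int
    target_clock_hz: float  # clock rate of the modeled (target) cores
    sim_clock_hz: float     # rate at which the simulation advances the target clock


def summarize(cfg: ClusterConfig, ipc: float = 1.0):
    """Return (total cores, total DRAM in TB, slowdown, instructions per second).

    `ipc` is an assumed average instructions-per-cycle per core.
    """
    cores = cfg.nodes * cfg.cores_per_node
    memory_tb = cfg.nodes * cfg.dram_gb_per_node / 1024  # 1 TB = 1024 GB here
    slowdown = cfg.target_clock_hz / cfg.sim_clock_hz
    instructions_per_s = cores * cfg.sim_clock_hz * ipc
    return cores, memory_tb, slowdown, instructions_per_s


if __name__ == "__main__":
    cfg = ClusterConfig(nodes=1024, cores_per_node=4, dram_gb_per_node=16,
                        target_clock_hz=3.2e9, sim_clock_hz=3.4e6)
    cores, mem_tb, slowdown, ips = summarize(cfg)
    print(f"{cores} cores, {mem_tb:.0f} TB, {slowdown:.0f}x slowdown, {ips:.3g} instr/s")
```

For the 1,024-node configuration this gives 4,096 cores, 16 TB of DRAM, a slowdown of about 941x (3.2 GHz / 3.4 MHz, under the 1,000x bound), and roughly 1.39 × 10^10 instructions per second, which matches the ~14 billion quoted only under the IPC-of-1 assumption.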